Human adenovirus type 3 (HAdV-3) is a causative agent of acute respiratory disease, which is 
prevalent throughout the world, especially in Asia. Here, the complete genome sequences of two 
field strains of HAdV-3 (strains GZ1 and GZ2) isolated from children with acute respiratory 
infection in southern China are reported (GenBank accession nos DQ099432 and DQ105654, 
respectively). The genomes were 35 273 bp (GZ1) and 35 269 bp (GZ2) and both had a 
G+C content of 51 mol%. They shared 99 % nucleotide identity and the four early and five late 
regions that are characteristic of human adenoviruses. Thirty-nine protein- and two RNA-coding 
sequences were identified in the genome sequences of both strains. Protein pX had a predicted 
molecular mass of 8.3 kDa in strain GZ1; this was lower (7.6 kDa) in strain GZ2. Both strains 
contained 10 short inverted repeats, in addition to their inverted terminal repeats (111 bp). 
Comparative whole-genome analysis revealed 93 mismatches and four insertions/deletions 
between the two strains. Strain GZ1 infection produced a typical cytopathic effect, whereas strain 
GZ2 did not; non-synonymous substitutions in proteins of GZ2 may be responsible for this 
difference. 


INTRODUCTION 


Adenoviruses (AdVs) are responsible for 5-10 % of lower 
respiratory tract infections in infants and children and infect 
a very broad spectrum of hosts, including cattle, duck, 
possum, dog, tree shrew, fish, frog, corn snake, and equine, 
ovine, porcine and simian animals (Davison et al., 2000; 
Farkas et al., 2002; Kovacs et al., 2003). They can be divided 
into four genera, Atadenovirus, Aviadenovirus, Mastadeno- 
virus, Siadenovirus, and unassigned species (Benkő et al., 
2000, 2002; Shenk, 2001; Davison et al., 2003; Kovacs et al., 
2003). Since the first human adenovirus (HAdV) was 
isolated (Rowe et al., 1953), 51 different HAdV serotypes 
have been identified within the genus Mastadenovirus and 
they can be classified into six species (HAdV-A to -F) based 
on a variety of parameters, including oncogenicity in 
rodents, electrophoretic mobility (Wadell, 1979) and DNA 
or genome identity (Garon et al., 1973; Green et al., 1979; 
Wadell et al., 1980; Wadell, 1984; De Jong et al., 1999), as 
well as the classical gold standards of serum neutralization 
and haemagglutination-inhibition tests (Davison et al., 
2003). 


The GenBank/EMBL/DDBJ accession numbers for the human 
adenovirus type 3 strain GZ1 and GZ2 sequences reported in this 
paper are DQ099432 and DQ105654, respectively. 


HAdV-B species have been divided further into two groups: 
B1, including HAdV-3, -7, -16, -21 and -50, and SAdV-21; 
and B2, including HAdV-11, -14, -34 and -35 (Wold et al., 
1979; Stone et al., 2003). HAdV-B group B1 viruses have 
been isolated from patients with febrile respiratory disease, 
especially fatal acute respiratory disease (ARD) (Hierholzer, 
1995; Erdman et al., 2002). Members of group B2, with the 
exception of HAdV-11a and -14 (Van der Veen, 1963; Mei 
et al., 1998), are associated with persistent infections of 
kidney and urinary tract (Myerowitz et al., 1975; Shields 
et al., 1985). 


Of the 51 HAdV serotypes, about one-third are associated 
with human diseases. Adenovirus infections can occur 
endemically or as outbreaks. HAdV-B group B1 (HAdV-3, 
HAdV-7 and, less frequently, HAdV-21) and HAdV-4 have 
been the causative agents in epidemic outbreaks of respiratory 
disease in Europe, America, Oceania and Asia (Herbert 
et al., 1977; Martone et al., 1980; Lewis et al., 2004; Frantzidou 
et al., 2005). Viruses of group B1 (HAdV-3, -7 and -21) can 
occasionally infect tissues of the central nervous system and 
cause aseptic meningitis, meningoencephalitis and encepha- 
litis (Chany et al., 1958; Faulkner & Van Rooyen, 1962; 
Similä et al., 1970). 


Epidemic outbreaks of ARD caused by HAdV-4 and HAdV-7 
among American and Canadian basic military trainees have 
been controlled by the introduction of effective live enteric- 
coated oral vaccines since the 1970s (Dudding et al., 1972); 
however, the manufacture of HAdV vaccines was discon- 
tinued in 1996. HAdV-3, first isolated by investigators in the 
Walter Reed Army Institute of Research from patients with 
ARD at Fort Leonard Wood (MO, USA) in the winter 
of 1952-1953 (http://history.amedd.army.mil/booksdocs/historiesofcomsn/section1.htm), 
is widely prevalent all over 
the world (Herbert et al., 1977; Martone et al., 1980; Ryan 
et al., 2002; Frantzidou et al., 2005), especially in Asia 
(Itakura et al., 1990; Itoh et al., 1999; Hong et al., 2001; Kim 
et al., 2003; Li et al., 2004). However, no efficient vaccine 
against HAdV-3 has been developed. HAdV infections are 
highly contagious and common in dense and close popula- 
tions, such as military training venues and day-care centres. 
The population of Asia is large and often dense, especially in 
China and Japan. Consequently, epidemics of ARD caused 
by HAdVs occur at high frequency. In July 2004, more than 
200 children from an infant school were infected with 
HAdV-3 in Guangzhou, southern China (Zhu et al., 2005). 


In Asia, multiple HAdV-3 genome types have been identified 
by restriction-enzyme analysis (Itakura et al., 1990; Itoh et al., 
1999; Kim et al., 2003). In China, the dominant genome type 
from 1962 to 1988 was HAdV-3a2, with occasional isolates 
of HAdV-3a4, HAdV-3a5 and HAdV-3a6 (Li & Wadell, 
1988; Li et al., 1996). In Japan, the dominant genome type 
from 1983 to 1991 was HAdV-3a, with occasional isolates of 
HAdV-3a8 and HAdV-3c (Itakura et al., 1990; Mizuta et al., 
1994; Shiao et al., 1996). In Seoul, Korea, six new variants, 
HAdV-3a13 to HAdV-3a18, were found from 1990 to 2000 
(Kim et al., 2003). Genome types may vary by location and 
time of isolation; some genome types may be associated with 
greater virulence (Kajon et al., 1990, 1996). It is therefore 
important to understand the genomics and bioinformatics of 
human disease-relevant HAdVs of group B1. 


Since the first HAdV genome sequence was reported 
(HAdV-2) (Roberts et al., 1984, 1986), the complete 
genome sequences of 21 members of the genus Mastadeno- 
virus have been released, with at least one from each species. 
For HAdV-B, to date, genomes of HAdV-7, -11, -21, -35 and 
-50 have been deposited in GenBank/EMBL (Gao et al., 
2003; Mei et al., 2003; Stone et al., 2003; Vogels et al., 2003; 
Roy et al., 2004; Purkayastha et al., 2005). The complete 
genomic sequence of HAdV-3 has not been reported previously. 
In this report, two complete and annotated genome 
sequences of HAdV-3a, strains Guangzhou01 (GZ1) and 
Guangzhou02 (GZ2), are described (GenBank accession nos 
DQ099432 and DQ105654, respectively); open reading frames 
and non-coding motifs were also analysed and compared. 


Strain GZ1 infection produced a typical cytopathic effect 
(CPE), whereas strain GZ2 did not. The genome organiza- 
tion of both strains is similar to that observed in other 
members of HAdV-B. Bioinformatics provides an insight 
into the biology of HAdV-3 and raised our interest in the 
CPE difference caused by the two strains. The clinical 
application of HAdV-2- and HAdV-5-based gene-transfer 
vectors has been hampered because of pre-existing immun- 
ity against HAdV-2 and HAdV-5, which could affect the 
efficacy and even safety of adenovirus vector administration. 
HAdV-3, unlike HAdVs that do not belong to HAdV-B, 
has no proven affinity for the coxsackievirus-adenovirus 
receptor (CAR) (Roelvink et al., 1998). This receptor 
diversity implies that HAdV-3 has a different tropism from 
CAR-interacting AdVs and could provide an alternative to 
HAdV-5-based gene-transfer vectors (Havenga et al., 2002; 
Sirena et al., 2004). 


METHODS 


Cells and virus strains. HAdV-3 strains GZ1 and GZ2 were iso- 
lated from nasal aspirates of children with clinical evidence of ARD 
in January 2005 and July 2004, respectively. The child from whom 
strain GZ2 originated had the symptoms of pharyngeal conjunctivitis 
and the other child had fever and bronchitis. Nasal aspirate specimens 
were inoculated into HEp-2, MDCK and HeLa culture tubes 
with an atmosphere of 5 % (v/v) carbon dioxide in Dulbecco's minimum 
essential medium supplemented with 100 IU penicillin ml⁻¹, 
100 µg streptomycin ml⁻¹ and 2 % (v/v) fetal calf serum. The culture 
tubes were observed for 3-4 weeks for CPE and isolates were identified by a 
neutralization assay with type-specific reference antisera raised in 
rabbits by conventional procedures (Hierholzer, 1995). Type-specific 
primers designed to the hypervariable regions (HVRs) of the HAdV 
hexon were also utilized to correctly identify HAdV-3, -4, -7 and -11. 
The following primers were used: 
HAdV-3-specific, HAdV-3S (5'-AAGACATTACCACTACTGAAGGAGAAG-3') and HAdV-3A (5'-CGCTAAAGCTCCTGCAACAGCA-3'); 
HAdV-4-specific, HAdV-4S (5'-GGTAGCTGCCATGCCAGGTG-3') and HAdV-4R (5'-CATAGTTAGGAGTGGCGGCGG-3'); 
HAdV-7-specific, HAdV-7S (5'-GGGAAAGACATTACTGCAGACAAC-3') and HAdV-7R (5'-GGCGAAAAAGCGTCAGCAG-3'); 
and HAdV-11-specific, HAdV-11S (5'-AGGAACACGTAACAGAAGAGGAAACC-3') and HAdV-11R (5'-TAGCTTCGGAACTTGTGTCTTCTGTT-3'). 
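The logic of a type-specific PCR can be illustrated in silico: a product is expected only when the forward primer and the reverse complement of the reverse primer both occur, in that order, on the template. The following is a minimal sketch under stated assumptions, not the authors' procedure. It uses exact string matching (real primers tolerate mismatches), the function names are hypothetical, and the toy template is invented for illustration and is not hexon sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by an exact-match primer pair, or None.

    The forward primer is searched on the given strand. The reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end < 0:
        return None
    return end + len(rc) - start


# HAdV-3-specific primer pair as listed in the text.
HADV3_S = "AAGACATTACCACTACTGAAGGAGAAG"
HADV3_A = "CGCTAAAGCTCCTGCAACAGCA"

# Hypothetical toy template: flank + forward site + 20 nt spacer
# + reverse-primer site + flank.
toy = "TTTTT" + HADV3_S + "ACGT" * 5 + reverse_complement(HADV3_A) + "GGGGG"
print(amplicon_length(toy, HADV3_S, HADV3_A))
```

Applied to a full hexon sequence instead of the toy template, the same function would report the expected product size for whichever primer pair matches.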


Preparation of viral DNA and genome-type analysis. Virus was 
propagated in HEp-2 cells and viral DNA was extracted by using a 
previously described method (Shinagawa et al., 1983). Purified 
HAdV genome DNA was digested by restriction enzymes (BamHI, 
EcoRI, HindIII, SalI and SmaI; TaKaRa). HAdV strains were 
genome-typed by comparing the restriction profiles with those of 
prototype and other genome types described in the literature and 
according to the genome-type denomination system (Li & Wadell, 
1988; Golovina et al., 1991; Li et al., 1996; Kim et al., 2003). 
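Once a genome sequence is available, the restriction profiles used for genome typing can be predicted computationally. The sketch below performs a complete in-silico digest of a linear sequence with the five enzymes named above. The recognition sites and cut positions are the standard ones for these enzymes; the function name and the toy sequence are hypothetical, and the code is an illustration rather than the typing method used in the study.

```python
# Recognition sequence and top-strand cut offset for each enzyme:
# G^GATCC, G^AATTC, A^AGCTT, G^TCGAC, CCC^GGG.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "SalI": ("GTCGAC", 1),
    "SmaI": ("CCCGGG", 3),
}


def fragment_lengths(genome, enzyme):
    """Fragment lengths (largest first) from a complete digest of a linear genome."""
    site, offset = ENZYMES[enzyme]
    genome = genome.upper()
    cuts = []
    pos = genome.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = genome.find(site, pos + 1)
    bounds = [0] + cuts + [len(genome)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical 47 nt toy sequence containing two EcoRI sites.
toy_genome = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
print(fragment_lengths(toy_genome, "EcoRI"))
print(fragment_lengths(toy_genome, "BamHI"))
```

Because all five sites are palindromic, scanning one strand is sufficient. Comparing the predicted fragment sizes for two strains shows which substitutions create or destroy a site.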


DNA cloning and sequencing. The restriction fragments of 
HAdV genome DNA digested with HindIII, EcoRI, BamHI or SmaI 
were purified with QIAquick Gel Extraction kits (Qiagen) and 
cloned into pBluescript SK(+) vectors. The entire HAdV-3 genome 
DNA was resequenced using primers from initially sequenced 
regions cloned in vectors and various HAdV-3 gene sequences 
archived in GenBank/EMBL. Template DNA (0.1-1.0 µg per reaction) 
was further purified by passing through Mini Spin Columns 
(Qiagen). The sequencing reaction was carried out by using an ABI 
Prism BigDye Terminator v3.1 Cycle Sequencing Ready Reaction kit 
with AmpliTaq DNA polymerase on an ABI 3730 DNA sequencer 
(Applied Biosystems). All of the reported sequences are the result of 
at least three sequencing reactions. 


Direct sequencing of inverted terminal repeat (ITR) ends. 
The 5’ and 3’ ends of the linear HAdV-3 genome were sequenced 
directly on an ABI 3730 DNA sequencer (Applied Biosystems) with 
the repurified genomic DNA as templates. Primers were designed 
from newly obtained internal sequences. 


Genome annotation and sequence analysis. The sequences 
were assembled with SEQMAN software from the Lasergene package 
(DNAStar) and SEQUENCHER 4.1.4 (Gene Codes). Genome annota- 
tion provided an additional layer of sequence quality control. 
Unresolved and ambiguous sequences were resequenced with pri- 
mers close to the regions in question. 


General features of the HAdV-3 genome sequences were revealed by 
using the University of Wisconsin Genetics Computer Group (GCG) 
package (SEQWEB v. 2). The genome sequence was annotated with the 
annotation protocol used for HAdV-1 genome analysis (Lauer et al., 
2004) by first dividing the sequence into contiguous 1 kb non- 
overlapping segments. Briefly, these segments were queried system- 
atically against the non-redundant NCBI database using the program 
BLASTX of the BLAST suite of sequence-alignment software (Altschul 
et al., 1990). Default parameters of word size = 3 and expectation = 10, 
with the BLOSUM62 substitution matrix and with gap penalties of 11 
(existence) and 1 (extension), were applied to these analyses. Low- 
complexity sequences were filtered out of the queries, as per the BLAST 
algorithm. 
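The first step of the annotation protocol, dividing the genome into contiguous, non-overlapping 1 kb segments for BLASTX queries, can be sketched as follows. The function names and the FASTA naming scheme are hypothetical, and the placeholder string only has the length reported for strain GZ1; it is not real sequence.

```python
def split_into_segments(sequence, size=1000):
    """Yield (start, end, chunk) for contiguous non-overlapping windows.

    Coordinates are 1-based and inclusive.
    """
    for offset in range(0, len(sequence), size):
        chunk = sequence[offset:offset + size]
        yield offset + 1, offset + len(chunk), chunk


def to_fasta(sequence, name="query", size=1000):
    """Format each window as a FASTA record named <name>_<start>-<end>."""
    return "".join(
        f">{name}_{start}-{end}\n{chunk}\n"
        for start, end, chunk in split_into_segments(sequence, size)
    )


# Placeholder with the length reported for strain GZ1 (35 273 bp).
placeholder = "N" * 35273
segments = list(split_into_segments(placeholder))
print(len(segments), segments[0][:2], segments[-1][:2])
```

Each FASTA record could then be submitted as a separate BLASTX query with the parameters given above. The last window is shorter than 1 kb because the genome length is not a multiple of 1000.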


GENSCAN 1.0 and GENOMESCAN were used for theoretical gene predictions 
(Yeh et al., 2001). They were useful for identifying exons from 
the coding sequences where exon-intron borders were difficult to 
determine. Other splice site-finder programs [WISE2 
(http://www.ebi.ac.uk/Wise2/advanced.html) and SPLICEPREDICTOR (Brendel & 
Kleffe, 1998)] were used to find splice-donor and -acceptor sites with 
the highest score. In parallel, novel sequences or 'hypothetical proteins' 
were also identified by using FGENESV, software for predicting potential 
genes in viral genomes 
(http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=gfindv) 
and GENEMARK v. 2.4, a hidden Markov model-based gene-prediction software 
(Besemer & Borodovsky, 1999). In these annotations, although FGENESV had a 
slightly higher accuracy than the others, none of them was completely 
comprehensive or accurate in predicting putative genes. The 
genome-annotation and -editing tool ARTEMIS was used to visualize and 
expedite the annotation (Berriman & Rutherford, 2003). 


Whole-genome alignment and comparisons of the sequences from 
HAdVs were performed by using the dot-plot software Advanced 
PipMaker (http://pipmaker.bx.psu.edu/cgi-bin/pipmaker?advanced), 
which aligns long genomic DNA sequences quickly and with good 
sensitivity (Schwartz et al., 2000). 
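The idea behind a dot plot can be shown with a much simpler technique than the one PipMaker uses: exact word matching. PipMaker builds its plots from gapped local alignments, so the sketch below is a stand-in and not a reimplementation. Every pair of positions at which the two sequences share an identical k-mer becomes one dot, identical genomes give an unbroken main diagonal, and substitutions leave gaps in it. The function name, the word size and the toy sequences are hypothetical.

```python
from collections import defaultdict


def dotplot_matches(seq_a, seq_b, k=11):
    """Return 0-based (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    return [
        (i, j)
        for i in range(len(seq_a) - k + 1)
        for j in index.get(seq_a[i:i + k], ())
    ]


# Two toy sequences differing by a single substitution at index 7.
a = "ACGTTGCAAGGC"
b = "ACGTTGCTAGGC"
print(dotplot_matches(a, b, k=4))
```

Dots off the main diagonal would indicate repeated or duplicated segments, which is how a duplication appears as short parallel lines in such a plot.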


RESULTS AND DISCUSSION 


Confirmation of serotype and genome type 


Typical CPE was found in all cells inoculated with strain 
GZ1, but not with strain GZ2. Both virus strains were 
neutralized specifically by HAdV-3-neutralizing rabbit 
immune serum. Antisera against the closely related 
HAdV-7 cross-reacted only slightly. The PCR assay also indicated 
that both HAdV strains were serotype 3, as only the 
primers specific to HAdV-3 yielded a product (314 bp). 
Genome typing by restriction profiles further showed 
that both strains were genome type HAdV-3a. 


General characteristics of the HAdV-3 genome sequence 


The genome sequences of HAdV-3 strains GZ1 and GZ2 
were annotated to identify biological features. This was 
facilitated by using reference genomes from the recently 
determined HAdV-11 (GenBank/EMBL accession nos 
AF532578 and AY163756) and HAdV-7 (GenBank/EMBL 
accession nos AY594255 and AC_000018) prototype strains. 
Like other members of the genus Mastadenovirus, the 
HAdV-3 genome is organized into early, intermediate and 
late transcription regions (Fig. 1). The strain GZ1 genome 
was 35 273 bp in length and had an overall base composition 
of 25.36 % A, 25.69 % C, 25.31 % G and 23.64 % T. The 
G+C content (51.0 mol%) was within the 50-52 mol% 
range noted in the literature for HAdV-B (Jin, 2001). Strain 
GZ1 DNA had an Mr of 2.1 × 10⁷, determined from its base 
composition. The strain GZ2 genome was 35 269 bp in 
length and had nearly the same composition as strain GZ1. 
Thirty-nine protein-coding sequences and two RNA-coding 
sequences were identified in the genome sequences of both 
strains, including the pX protein (with predicted molecular 
masses of 8.3 kDa in strain GZ1 and 7.6 kDa in strain GZ2). 
Functionally, other non-coding features, such as promoters 
and transcription factor-binding and -recognition sites, were 
conserved between the two strains, as shown in Table 1. 


[Figure 1: linear genome map with a 0-35 000 bp scale; image not reproduced.] 

Fig. 1. Genomic organization and transcription map of HAdV-3 strains GZ1 and GZ2. Arrows indicate the locations of coding 
regions. Early and late transcription units are shown with brackets. Abbreviations: DBP, DNA-binding protein; pTP, terminal 
protein precursor; MLP, major late promoter. 


Table 1. HAdV-3 strains GZ1 and GZ2 genome-sequence annotation and comparison 

Non-coding motifs and coding regions are identified for HAdV-3 strains GZ1 and GZ2. Their proteins and putative functions are indicated. 
The nucleotide positions of the start and stop codons and the applicable splice sites are noted (5'→3' direction). Functionality embedded 
within the complementary strand and coding sequences transcribed from the complementary strand are indicated by 'c', e.g. (3925-3930)c. 
An asterisk indicates that the same position is found in strain GZ2 as in strain GZ1. A dash (–) indicates that the strain does not include the 
corresponding protein. 

Region | Product | Genome location, strain GZ1 | Genome location, strain GZ2 
 | ITR | 1-111 | * 
 | DNA Pol-pTP binding site | 9-18 | * 
E1A | TATA box for E1A | 480-485 | * 
E1A | 6.8 kDa protein | 576-647, 1248-1349 | * 
E1A | 28.5 kDa protein | 576-1155, 1248-1453 | * 
E1A | 24.7 kDa protein | 576-1062, 1248-1453 | * 
E1A | PolyA signal for E1A | 1492-1497 | * 
E1B | TATA box for E1B | 1547-1552 | * 
E1B | 21 kDa protein | 1601-2137 | * 
E1B | 55 kDa protein | 1906-3384 | * 
E1B | PolyA signal for E1B | 3402-3407 | * 
IX | TATA box for IX | 3382-3387 | * 
IX | Hexon-associated protein IX | 3478-3894 | * 
IX | PolyA signal for IX | 3907-3912 | * 
IVa2 | Maturation protein IVa2 | (3946-5279, 5558-5570)c | * 
E2B | PolyA signal for E2B | (3925-3930)c | * 
E2B | TATA box for MLP | 5870-5876 | * 
E2B | DNA polymerase | (5049-8540, 13847-13855)c | * 
E2B | Terminal protein precursor (pTP) | (8420-10387, 13847-13855)c | * 
VA RNA | VA RNA I | 10419-10588 | * 
VA RNA | VA RNA II | 10671-10839 | * 
L1 | 52/55 kDa protein | 10868-12025 | * 
L1 | Protein IIIa precursor | 12050-13816 | * 
L1 | PolyA signal for L1 | 13829-13834 | * 
L2 | Penton base protein III | 13904-15538 | * 
L2 | Protein VII precursor | 15552-16130 | 15550-16128 
L2 | Minor core protein pV | 16173-17225 | 16171-17223 
L2 | pX 8.3 kDa protein | 17254-17481 | – 
L2 | pX 7.6 kDa protein | – | 17252-17461 
L2 | PolyA signal for L2 | 17500-17505 | 17498-17503 
L3 | Protein VI precursor | 17557-18309 | 17555-18307 
L3 | Hexon (protein II) | 18422-21256 | 18420-21254 
L3 | 23.7 kDa protease | 21293-21922 | 21291-21920 
L3 | PolyA signal for L3 | 21942-21947 | 21940-21945 
E2A | PolyA signal for E2A | (21954-21959)c | (21952-21957)c 
E2A | DNA-binding protein | (22009-23559)c | (22057-23557)c 
L4 | 100 kDa protein | 23590-26064 | 23588-26062 
L4 | 22 kDa protein | 25766-26365 | 25764-26363 
L4 | 33 kDa protein | 25766-26234, 26284-26639 | 25764-26232, 26282-26637 
L4 | pVIII | 26709-27392 | 26707-27390 
E3 | 12.1 kDa protein | 27392-27712 | 27390-27710 
E3 | 16.1 kDa protein | 27666-28106 | 27664-28104 
E3 | 19.3 kDa protein | 28091-28609 | 28089-28607 
E3 | 20.1 kDa protein | 28639-29178 | 28637-29176 
E3 | 20.5 kDa protein | 29191-29760 | 29189-29758 
E3 | 10.3 kDa protein | 29994-30269 | 29990-30265 
E3 | 16.5 kDa protein | 30241-30678 | 30237-30674 
E3 | 15.3 kDa protein | 30671-31081 | 30667-31077 
E3 | PolyA signal for E3 | 31114-31119 | 31110-31115 
CDS | U exon | (31125-31286)c | (31121-31282)c 
L5 | Fiber protein | 31301-32260 | 31297-32256 
L5 | PolyA signal for L5 | 32268-32273 | 32264-32269 
E4 | PolyA signal for E4 | (32285-32290)c | (32281-32286)c 
E4 | E4 ORF6/7 | (32301-32552, 33275-33448)c | (32297-32548, 33271-33444)c 
E4 | E4 ORF6 | (32549-33448)c | (32545-33444)c 
E4 | E4 ORF4 | (33351-33719)c | (33347-33715)c 
E4 | E4 ORF3 | (33728-34081)c | (33724-34077)c 
E4 | E4 ORF2 | (34078-34512)c | (34074-34508)c 
E4 | E4 ORF1 | (34509-34886)c | (34505-34882)c 
E4 | TATA box for E4 | (34967-34972)c | (34963-34968)c 
 | ITR | 35163-35273 | 35159-35269 
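Base composition and G+C content are simple counts over the sequence. The sketch below shows one way to compute them and then checks the percentages reported for strain GZ1 for internal consistency. The function names are hypothetical, and ambiguous bases are ignored, which is an assumption of this sketch.

```python
from collections import Counter


def base_composition(sequence):
    """Percentage of A, C, G and T among the unambiguous bases."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(sequence):
    """G+C content as a percentage (mol%)."""
    composition = base_composition(sequence)
    return composition["G"] + composition["C"]


# Consistency check on the values reported in the text for strain GZ1.
reported = {"A": 25.36, "C": 25.69, "G": 25.31, "T": 23.64}
print(round(reported["G"] + reported["C"], 2))  # G+C in mol%
print(round(sum(reported.values()), 2))         # should total 100 %
```

The reported C and G percentages add up to the stated G+C content of 51.0 mol%, and the four values together account for the whole genome.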


Comparison and analysis of inverted repetitive sequences 


Analysis of the two HAdV-3 strains revealed several inverted 
repetitive sequences (Table 2). The perfect 111 bp inverted 
terminal repeats (ITRs) at either end of the genome were 
identified in both strains. The entire HAdV-3 ITR sequences 
were very similar to the 108 bp ITR of the closely related 
HAdV-7 (strain Gomen) (Purkayastha et al., 2005) and the 
114 bp ITR of HAdV-50 (strain Wan) (GenBank/EMBL 
accession no. AY737798), apart from seven mismatches in 
both cases. This contrasts with ITRs of the other HAdV-Bs: 
136 bp for HAdV-3 (Tolun et al., 1979) and HAdV-7 (strain 
Greiner) (Shinagawa & Padmanabhan, 1980); 137 bp for 
HAdV-11 (Mei et al., 2003; Stone et al., 2003) and HAdV-35 
(Kovacs et al., 2004); and 121 bp for SAdV-21 (Davison 
et al., 2003). The ITRs did not contain the consensus 
motif CATCATCAAT found in most other HAdVs (Stone 
et al., 2003). Instead, they ended with the sequence 
CTATCTATAT, as found in HAdV-7. The conserved 
TATAATATACC motif that binds the complex of terminal 
protein precursor (pTP) and DNA polymerase during viral 
DNA replication was present at 8-18 bp (Temperley & Hay, 
1992). The ITRs are critical to virus replication, as well as for 
gene activation and transcription. The transcription factor 
DNA-binding motifs, such as the NFIII/Oct-1 binding site at 
40-50 bp, Sp1 binding site at 72-79 bp and NFI binding site 
at 26-39 bp, were also identified in the ITR regions of strains 
GZ1 and GZ2. Other inverted repetitive sequences were 
short and their function is not yet known. Inverted repetitive 
sequences are abundant in HAdV-3 genomes and may play a 
role in transcription. Moreover, inverted repeats were 
also identified in the genomes of most other HAdV-B (data 
not shown). 


Table 2. Inverted repeats in HAdV-3 strains GZ1 and GZ2 

Inverted repeats in HAdV-3 strains GZ1 and GZ2 of 13 nt or 
longer are shown. The two arms of each repeat are listed on 
consecutive rows. An asterisk indicates that the same position is 
found in strain GZ2 as in strain GZ1. A dash (–) indicates that the strain 
does not include the corresponding repeat. 

Length (bp) | Sequence | Position, strain GZ1 | Position, strain GZ2 
111 | CTATC...CGGGG | 1-111 | * 
 | GATAG...GCCCC | 35273-35163 | 35269-35159 
14 | CTGAAACTGTTTGG | – | 2279-2292 
 | GACTTTGACAAACC | – | 32436-32423 
13 | AAAGCGAAAGTAA | 7255-7267 | * 
 | TTTCGCTTTCATT | 28130-28118 | 28128-28116 
13 | CAGCAACTTCATG | 21025-21037 | 21023-21035 
 | GTCGTTGAAGTAC | 21757-21745 | 21755-21743 
13 | CCTCAATCTCTTC | 9196-9208 | * 
 | GGAGTTAGAGAAG | 23773-23761 | 23771-23759 
13 | CTGGTAGCCAATG | 10133-10145 | * 
 | GACCATCGGTTAC | 20734-20722 | 20732-20720 
13 | GAGTTTTGGCTGG | 19437-19449 | 19435-19447 
 | CTCAAAACCGACC | 32837-32825 | 32833-32821 
13 | GCTGCAGCTGCTG | 14927-14939 | * 
 | CGACGTCGACGAC | 21291-21279 | 21289-21277 
13 | GGAGGCAAGTCCA | 8129-8141 | * 
 | CCTCCGTTCAGGT | 18122-18110 | 18120-18108 
13 | TTTCTTCTCCTTC | 9622-9634 | – 
 | AAAGAAGAGGAAG | 18962-18950 | – 
13 | TCGGGGTGAAATT | – | 4252-4264 
 | AGCCCCACTTTAA | – | 25554-25542 
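An inverted repeat is a segment whose reverse complement occurs elsewhere on the same strand, and an inverted terminal repeat is the special case in which the genome start matches the reverse complement of its end. The sketch below shows one way to detect both by exact matching. It is an illustration under stated assumptions, not the search procedure used for Table 2: the function names are hypothetical, the toy sequences are invented (one 13 nt arm and the 10 nt terminal sequence are borrowed from the text as examples), and overlapping windows of a longer repeat would be reported separately.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_repeat_length(genome):
    """Length of the perfect inverted terminal repeat of a linear genome."""
    genome = genome.upper()
    rc = reverse_complement(genome)
    n = 0
    while n < len(genome) // 2 and genome[n] == rc[n]:
        n += 1
    return n


def inverted_repeats(genome, k=13):
    """1-based start pairs (i, j) where the k-mer at j is the reverse
    complement of the k-mer at i and the two arms do not overlap."""
    genome = genome.upper()
    starts = {}
    for i in range(len(genome) - k + 1):
        starts.setdefault(genome[i:i + k], []).append(i)
    hits = []
    for i in range(len(genome) - k + 1):
        for j in starts.get(reverse_complement(genome[i:i + k]), ()):
            if j >= i + k:
                hits.append((i + 1, j + 1))
    return hits


# Toy sequences; only the repeat arm and terminal motif come from the text.
arm = "AAAGCGAAAGTAA"
toy = "CC" + arm + "TGTGTGTG" + reverse_complement(arm) + "CC"
toy_itr = "CTATCTATAT" + "GATTACAG" + reverse_complement("CTATCTATAT")
print(inverted_repeats(toy))
print(terminal_repeat_length(toy_itr))
```

Table 2 prints the second arm as the complementary strand with descending coordinates. Read in ascending order on the top strand, that arm is the reverse complement of the first, which is what the function matches.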


Whole-genome comparison 


Genome sequences of the two strains were aligned by using 
PipMaker. Both genomes shared close identity with respect 
to nucleotide sequences. PipMaker analysis suggested a small 
duplication (covering the region of nt 28 600-29 800) that is 
conserved in both strains and minimal differences at the 
gross level (Fig. 2). However, under detailed scrutiny, strain 
GZ2 had 93 mismatches and four gaps compared with 
strain GZ1, which caused typical CPE. The synonymous and 
non-synonymous substitutions resulting from these mismatches 
are shown in Table 3. Eighteen proteins had synonymous 
substitutions and 27 proteins had non-synonymous substitutions, 
including a 'C to T' change, which shortened the L2 pX 
protein (8.3 kDa in strain GZ1) to 7.6 kDa in GZ2 as a 
result of an internal stop codon (TAG) at nt 17459-17461. 
Of the 27 proteins with non-synonymous substitutions, 
high numbers were found in pTP (six substitutions) and 
hexon (five substitutions). Penton base, DNA Pol, 100 kDa 
and 12.1 kDa proteins also had three or four non-synonymous 
substitutions. The effects of such non-synonymous 
substitutions are not known. The synonymous substitutions 
may be single-nucleotide polymorphisms, as they did not 
change amino acids. 


[Figure 2: dot plot with both axes spanning nt 1-35273; image not reproduced.] 

Fig. 2. Whole-genome analyses of the HAdV-3 strain GZ1 and 
GZ2 sequences. The genome sequences of the two strains 
were aligned and analysed. Dot-plot analysis of the aligned 
sequences was displayed by PipMaker; genome duplication 
(covering the region nt 28 600-29 800) is indicated by short 
parallel diagonal lines above and below the main line in the 
upper right portion of the plot. 
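Whether a nucleotide mismatch is synonymous or non-synonymous depends only on the codon in which it falls. The sketch below classifies a single-codon change with the standard genetic code. The CAG to TAG example follows from the text (a C to T change producing a TAG stop in place of glutamine), whereas the tyrosine codons are hypothetical examples, because the source reports only the amino-acid changes. The function name is also hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(ref_codon, alt_codon):
    """Classify a codon change; returns (kind, 'X->Y')."""
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    if ref == alt:
        kind = "synonymous"
    elif alt == "*":
        kind = "non-synonymous (premature stop)"
    else:
        kind = "non-synonymous"
    return kind, f"{ref}->{alt}"


print(classify("CAG", "TAG"))  # the pX change described in the text
print(classify("TAT", "TAC"))  # hypothetical codons for a Y->Y change
print(classify("TAT", "TTT"))  # hypothetical codons for a Y->F change
```

Running such a classifier over every mismatched codon in an alignment of the two genomes would produce a tally of the kind summarized in Table 3.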


E1A and E1B 


The E1A proteins regulate viral and host gene expression by 
interacting with various members of the host-cell tran- 
scription machinery. E1A coding sequences are conserved 
across the various Mastadenovirus species. One substitution 
of Y to F was identified in the E1A 28.5 kDa protein of 
the GZ2 genome. The E1B 55 kDa protein was similar to 
the large T antigen protein, which has been shown to inhibit 
cellular p53-mediated host-defence mechanisms (Yew et al., 
1994). 


E2 


Three proteins required for viral DNA replication have 
been identified in the E2 transcriptional unit. Six non- 
synonymous substitutions were found in the E2B terminal 
protein precursor of strain GZ2, three in DNA polymerase 
protein and two in DNA-binding protein. The high number 
of substitutions in these proteins may potentially affect the 
replication course in the GZ2 genome. 


E3 


The E3 region of HAdVs encodes proteins that are not 
essential for in vitro growth. Both the 16.5 kDa and 
15.3 kDa proteins were similar to adenoviral E3 proteins 
that are known to protect virus-infected cells against TNF-induced 
cytolysis (Horton et al., 1990). Non-synonymous 
substitutions occurred in half of the eight proteins in the 
E3 region, including the 15.3 kDa protein ('R to Q' and 'R 
to C'). 


E4 


Unlike the other early transcripts, the proteins encoded by 
the E4 transcription unit have various functions, including 
viral RNA export and stabilization (Leppard, 1997). Six 
proteins were identified in both strains. Non-synonymous 
substitutions were found in three of these proteins. 


IX and IVa2 


Protein IVa2 in the intermediate gene region of HAdV-3 
strain GZ2 had one non-synonymous substitution. Proteins 
IX and IVa2 play a critical role in controlling DNA 
packaging during AdV assembly (Zhang et al., 2001; Sargent 
et al., 2004) and act as transcriptional activators depending 
on the presence of the TATA box upstream for HAdV-3. 


L1 


Protein IIIa precursor and the 52/55K protein homologue 
(with a predicted molecular mass of 43.8 kDa) were 
identified in both strains. Three non-synonymous substitutions 
were found in protein pIIIa of strain GZ2. The 52/ 
55 kDa protein acts as a scaffold for the capsid during virus 
assembly (Hasson et al., 1989). 


Table 3. Substitutions caused by mismatches in HAdV-3 strain GZ2 coding sequences 
compared with strain GZ1 

The absence of any substitution (synonymous or non-synonymous) in a protein is indicated by a dash (–). 

Region | Product | Synonymous | Non-synonymous 
E1A | 28.5 kDa protein | – | Y→F 
E1B | 21 kDa protein | – | Q→R 
IVa2 | IVa2 | Y→Y | R→L 
E2B | DNA Pol | Y→Y | K→E, A→V, A→V 
E2B | pTP | P→P | S→F, E→G, I→T, R→G, E→G, G→G 
L1 | Protein IIIa precursor | Q→Q | V→L, D→N, A→V 
L2 | Penton base protein | V→V | I→V, D→G, M→I, G→S 
L2 | Minor core protein pV | – | P→L 
L2 | pX 7.6 kDa protein | – | Q→stop codon 
L3 | Protein VI precursor | E→E | R→K 
L3 | Hexon | R→R, V→V | Q→R, A→T, M→V, Y→H, T→M 
L3 | 23.7 kDa protein | T→T | R→C 
E2A | DNA-binding protein | K→K | E→G, T→P 
L4 | 100 kDa protein | V→V, I→I, K→K | E→K, G→S, K→E 
L4 | 33 kDa protein | H→H | Q→R, H→Y 
L4 | 22 kDa protein | H→H | Q→R 
L4 | pVIII | L→L, R→R | S→N, Y→C 
E3 | 12.1 kDa protein | – | A→S, G→E, V→D 
E3 | 16.1 kDa protein | P→P, V→V | – 
E3 | 19.3 kDa protein | – | G→V, G→S 
E3 | 20.5 kDa protein | – | R→K 
E3 | 10.3 kDa protein | – | R→C 
E3 | 15.3 kDa protein | S→S | R→Q, R→C 
CDS | U exon | V→V, G→G | G→E, T→A 
L5 | Fiber | – | S→P, I→M 
E4 | ORF6/7 | L→L, I→I | A→E 
E4 | ORF6 | D→D | C→Y 
E4 | ORF2 | – | L→F 
E4 | ORF1 | P→P, R→R | – 


L2 


Four coding sequences were identified in the L2 regions, 
including the penton base protein III coding sequence. 
Penton base protein contains a conserved Arg-Gly-Asp 
(RGD) sequence and is involved in virus internalization 
through interaction with different host integrins (Wickham 
et al., 1993). The penton base proteins of strains GZ1 and 
GZ2 were 99.7 % identical at the nucleotide level and 99.5 % 
at the amino acid level. There were four non-synonymous 
substitutions in the penton base protein of strain GZ2. The 
effects of these mutations are difficult to predict, as the 
structural and functional domains of the penton base 
protein have yet to be determined. A non-synonymous 
substitution in the pX protein, which has a predicted 
molecular mass of 8.3 kDa in strain GZ1 and 7.6 kDa in 
strain GZ2, gave rise to an internal stop codon (TAG) at nt 
17459-17461; the effects of this substitution are not yet 
known. 
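The size of the pX truncation can be worked out from the coordinates in Table 1. The sketch below converts 1-based inclusive coordinates of an unspliced coding sequence into a residue count. It assumes that the listed coordinates include the stop codon, which holds for strain GZ2 because the reported TAG at nt 17459-17461 is the end of its pX entry, and is taken as an assumption for strain GZ1. The function name is hypothetical.

```python
def coding_residues(start, end):
    """Residues encoded by an unspliced CDS given 1-based inclusive
    coordinates that include the stop codon (an assumption here)."""
    length = end - start + 1
    if length % 3:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3 - 1  # the stop codon encodes no residue


gz1_px = coding_residues(17254, 17481)  # pX 8.3 kDa protein, strain GZ1
gz2_px = coding_residues(17252, 17461)  # pX 7.6 kDa protein, strain GZ2
print(gz1_px, gz2_px, gz1_px - gz2_px)
```

Under these assumptions the GZ2 protein is six residues shorter, so the 0.7 kDa difference in predicted mass corresponds to roughly 0.12 kDa per missing residue, a plausible average residue mass.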


L3 


Three coding sequences were found in the L3 regions: minor 
capsid protein precursor pVI, hexon and 23.7 kDa protease. 
The hexon protein accounted for 83 % of the adenovirus 
capsid and is known to be the principal antigenic component 
that results in protective immunity following natural 
infections. Leucine, asparagine and threonine are the three 
most abundant amino acids in the hexon of all HAdV-B 
(data not shown). A CLUSTAL-based multiple sequence 
alignment revealed seven HVRs (Fig. 3) between the hexons 
of HAdV-3, -5, -7, -11, -16, -21, -34 and -35, and SAdV-21, 
which account for 99 % of the serotype-specific variations 
(Crawford-Miksza & Schnurr, 1996). Most of the antibodies 
against the hexon in an adenovirus infection are directed 
against epitopes within these seven HVRs. A comparison 
of the hexon coding sequences from the strain GZ1 and 
GZ2 genomes identified two synonymous and five non-synonymous 
substitutions (Table 3) and they were 99 % 
identical at the amino acid level. Interestingly, two 
synonymous substitutions in strain GZ2 occurred within 
the HVRs, whereas the five non-synonymous substitutions 
were found in the conserved regions of the hexon coding 
sequences. The seven HVRs contained >99 % of hexon 
serotype-specific residues. Both strains belonged to HAdV-3, 
so, in these regions, a complete identity between the two 
strains is not unexpected. On the other hand, the hexon 
epitopes are known to be conformational (Crawford-Miksza 
& Schnurr, 1996). Therefore, a change in a structural region, 
such as the M221V substitution in the conserved region of 
the L1 loop between HVR3 and HVR4, may affect protein 
folding in the antigenic regions to a certain extent. 


[Figure 3: multiple sequence alignment panels; image not reproduced.] 

Fig. 3. Multiple sequence alignment of the hexon proteins of 
HAdV-3, -5, -7, -11, -16, -21, -34 and -35, and SAdV-21; 
HAdV-3-1 and HAdV-3-2 correspond to strains GZ1 and GZ2, 
respectively. CLUSTAL_W alignment of the amino acid sequences 
of the hexons reveals seven major hypervariable regions 
(HVR1-HVR7). Dots, conserved amino acids; dashes, gaps. 
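Percentage identity between two aligned sequences is the fraction of aligned columns that carry the same residue. The sketch below computes it for a pairwise alignment and then checks the hexon figure arithmetically from values given in the text and Table 1. The function name and the toy alignment are hypothetical. The check assumes that the hexon coordinates (18422-21256) include the stop codon, giving 944 residues, and that the five non-synonymous substitutions are the only amino-acid differences.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity (%) over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"
    ]
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: one gap column and one mismatch.
print(percent_identity("MKV-LT", "MRVALT"))

# Hexon check: (21256 - 18422 + 1) / 3 - 1 = 944 residues, 5 differences.
hexon_residues = (21256 - 18422 + 1) // 3 - 1
print(round(100.0 * (hexon_residues - 5) / hexon_residues, 2))
```

The result of about 99.5 % is consistent with the statement that the two hexon proteins were 99 % identical at the amino acid level.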


L4 


Four coding sequences were identified, corresponding to the 
100 kDa, 22 kDa, 33 kDa and pVIII proteins. The 100 kDa 
non-structural protein is involved in hexon assembly 
(Oosterom-Dragon & Ginsberg, 1981), selective activation 
of late viral protein synthesis (Hayes et al., 1990) and 
inhibition of granzyme B-mediated lysis (Andrade et al., 
2001). Protein VIII is associated with the formation of a 
possible link between the hexon capsomere and core capsid 
components (Shenk, 2001). In the 100 kDa protein of strain 
GZ2, three synonymous and three non-synonymous sub- 
stitutions were identified. 


L5 


The adenovirus fiber protrudes from the vertices of the 
capsid, is responsible for the virus binding to host cells and 
is a major determinant of tissue tropism. The fiber coding 
sequences of the two strains were 99 % identical at the amino 
acid level, with two non-synonymous substitutions. The 
substitutions resulted in the amino acid changes S10P and 
I286M, which occurred in the fiber 'tail' and 'knob', respectively. 
Unlike members of other HAdV groups, members of 
HAdV-B do not bind the CAR (Defer et al., 1990). 


Conclusion 


The complete genomes of HAdV-3 strains GZ1 and GZ2 
have been sequenced and annotated. The difference in CPE 
caused by the two strains was analysed at the genome level. 
Based on bioinformatic analyses, non-synonymous sub- 
stitutions in the E2 terminal protein precursor, DNA 
polymerase protein and DNA-binding protein of strain GZ2 
were identified, which may potentially affect the replication 
course in the strain GZ2 genome. Two non-synonymous 
substitutions were also identified in the GZ2 E3 15.3 kDa 
protein, which is similar to the proteins that protect virus- 
infected cells against TNF-induced cytolysis. In the conserved 
regions of the hexon coding sequences, five non-synonymous 
substitutions were found. The differential CPEs induced by 
the two strains must be caused by the genome differences, 
although this has not yet been defined precisely. Both children 
infected with the adenovirus strains exhibited overt disease: 
the child infected with GZ1 had fever and bronchitis, whereas 
the child with GZ2 had pharyngeal conjunctivitis. Thus, 
although the viruses possessed different growth character- 
istics in vitro, they were both virulent. Finally, as it has a 
different tropism from CAR-interacting HAdVs, HAdV-3 
has the potential to be developed as an alternative gene- 
transfer vector to HAdV-5. 